
"People who like this sort of thing will find this the sort of thing they like." (Lincoln)

 
  
 
 
 
 

 
 

POSC 4503

Introduction to Public Policy Studies

Comments on Anderson's ch 7

As always, unless otherwise noted, references to Anderson are to his 2011 7th edition assigned as a text for this course.

Very simply, systematic policy evaluation (Anderson: 277-78) is the activity of determining whether a policy (or a program) has caused the intended results.  And that’s where the simple ends.  Evaluation is all about making a causal statement: being able to say with confidence that the program caused the result. In other words, evaluation is testing the strength of a supposed cause-effect relationship between a program and certain results. In symbolic terms:

X --> Y


Three requirements must be satisfied to make a causal statement, or in public policy terms to say that a program caused particular outcomes.  For illustration purposes, say the program is credit counseling and the policy goal is reduced credit card debt for a particular population--

  • X and Y must covary, which means that once the program is implemented, if there is no change in the level of debt of program beneficiaries, compared, say, to those who did not get the credit counseling, then clearly the program has no effect on participants; if program participants have, again compared to those not in the program, lower debt (or even if they have higher debt), the first requirement is satisfied: X (the program) and Y (clients’ level of debt) move together.
  • Temporal priority, which means that X must occur in time before Y; if program participants’ debt goes down before they are recruited to and participate in the program, then that suggests that the results caused the program; it seems only logical that a program cannot be the cause of the results if the results occur prior to the program treatment.
  • Elimination of rival hypotheses, which means it is possible to rule out the operation of any other factor(s) as a cause of the results, that it is possible to defeat all competing arguments that one so-called Z factor or another is at play, something which would undermine our confidence that X --> Y; for example, if the clients of the credit counseling program volunteer for the program, it may be their motivation to reduce debt rather than the counseling that results in an improved balance sheet.

Covariation is easy to determine given some data, and temporal priority is generally not hard to resolve.  It is in checking against the possible impact of other factors, other explanations for the apparent X --> Y relationship (“the rival hypothesis” problem, the Z factor), where evaluation becomes exceptionally difficult.  Surveying the work of others, Hoole (1978) elaborates 35 different “threats” to efforts to find that a program has a particular effect.  Coping with the challenge of distinguishing real program effects from contaminating factors is what makes choosing an appropriate research design (Anderson: 278-79) so important.
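To make the covariation requirement and the Z-factor problem concrete, here is a minimal sketch in Python of the credit counseling example.  The debt figures, the comparison group, and the before/after framing are invented for illustration, not drawn from Anderson or any actual study.

# A minimal sketch (invented numbers, not real program data) of the
# covariation check for the credit counseling example: compare the change in
# average credit card debt for program participants (X present) with the
# change for a comparison group that received no counseling (X absent).

participants = {"before": 9_200, "after": 7_400}   # average debt, received counseling
comparison = {"before": 9_100, "after": 8_800}     # average debt, no counseling

change_participants = participants["after"] - participants["before"]
change_comparison = comparison["after"] - comparison["before"]

print(f"Change for participants:     {change_participants:+,} dollars")
print(f"Change for comparison group: {change_comparison:+,} dollars")

# The two changes differ, so X (the program) and Y (debt) covary.  But the
# difference is a causal estimate only if rival hypotheses -- self-selection
# by motivated volunteers, a changing economy, and so on -- can be ruled out.
print(f"Apparent program effect: {change_participants - change_comparison:+,} dollars")

The sketch also shows why research design matters: the comparison group is what gives the apparent effect any meaning at all, and a design that assigns people to the program, rather than letting them volunteer, is what lets us begin to rule out the Z factor.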

The threats from competing explanations are heightened by the fact that most of the policies which we want to evaluate involve human behavior, which is more complicated than the mechanics of things.  It is relatively easy to evaluate the crash-worthiness of automobiles which are tested in a controlled environment using crash dummies.  (Check out a Consumer Reports video which evaluates the performance of two Chevrolets in a head-to-head test, conclusively demonstrating the effects of improved automobile design and equipment.  On the other hand, the test does not demonstrate exactly which of a bundle of design and equipment changes really makes the difference.  Quite possibly, one or more of the new features adds little or no protection.  Moreover, note that the video, at the very end, makes an exceptional and unwarranted claim that the “roads are safer” because of the Insurance Institute for Highway Safety.  If highway accidents or fatalities (and they are different things, implying different goals) are down, is this because of changes in automobile design and equipment or something else?  Can it be explained by a rise in driver education programs?  Raising the drinking age to 21 and tightening permissible blood alcohol limits?  A trend toward SUVs or toward smaller, lighter cars?  Better-designed highways?)  Turning to people rather than machines--children are not crash dummies.  The factors, aside from an intervention program like Head Start (Anderson: 297-301), that might affect children’s educational performance are many, indeed, and that makes the evaluation of Head Start a challenging task.

Anderson (272-276) identifies a number of important aspects of evaluation

  • Effects, intended & unintended
  • Impacted populations, those targeted for the policy & others who may experience positive or negative spill-over effects
  • The time-stream of effects, short-term v long-term
    • This directs us to the “time-value of money,” which says that resources today are worth more than resources a year from now, the principle underlying the practice of discounting in benefit-cost analysis (Anderson: 292); a short numerical sketch of discounting follows this list
      • There is a political analogue here: government officials, especially those holding elective office, will, in comparing policy options, prefer those options which produce political benefits quickly and push political costs off into the future
    • “In the long run we’re all dead,” but until then policies have long-term consequences, positive or negative, and thus policymaking requires forecasting, an analytic operation that can be as challenging as benefit-cost analysis and systematic evaluation; like systematic evaluation and benefit-cost analysis, forecasting requires a certain level of expertise, which gives policymaking influence to those who have the knowledge and skills to make projections
  • Costs, direct (governmental and nongovernmental), indirect (which, again, may be governmental and nongovernmental), and opportunity costs
  • Nature of benefits and costs, tangible v intangible, which really focuses on how easily program aspects can be measured
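As promised above, here is a minimal sketch in Python of what discounting does to a stream of future benefits.  The 5 percent discount rate and the benefit figures are assumptions chosen purely for illustration.

# A minimal sketch of discounting in benefit-cost analysis (Anderson: 292).
# The discount rate and the benefit stream are invented for illustration.

discount_rate = 0.05                        # assumed rate: 5 percent per year
benefits = [100_000, 100_000, 100_000]      # benefits arriving in years 1, 2, 3 (dollars)

present_value = 0.0
for year, benefit in enumerate(benefits, start=1):
    # Each future benefit is divided by (1 + r)^t: resources today are worth
    # more than the same resources t years from now.
    discounted = benefit / (1 + discount_rate) ** year
    present_value += discounted
    print(f"Year {year}: {benefit:,} dollars is worth {discounted:,.0f} dollars today")

print(f"Present value of the whole benefit stream: {present_value:,.0f} dollars")
# A program whose benefits arrive far in the future looks less attractive once
# discounting is applied -- the analytic counterpart of officials' preference
# for quick political benefits and deferred political costs noted above.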

Evaluation is not simple, as Anderson points out (287-92)

  • Goal uncertainty (remember vagueness?)
  • Problems in making causal claims (discussed above)
  • Diffuse impacts because of spillover effects (Anderson: 273, 293) and the time-stream of effects (ibid.)
  • Availability of information, though there are two distinct problems here
    • Having good-quality data on program costs and benefits (Anderson [292-93] talks about this in the context of benefit-cost analysis)
    • The tension between giving programs the time they need to develop and wanting to evaluate them quickly, while they are still open to change and before money is wasted
  • Resistance
    • As Anderson (286) correctly notes, program operators are likely to be resistant to the efforts of evaluation researchers because negative findings threaten the agency’s mission commitments, its staffing levels, and operational funding.  As an evaluation staffer at the Economic Development Administration once noted to me, there is also a psychological dimension here. As this informant observed, there is no nonjudgmental alternative term for evaluation—appraisal, assessment, testing, checking, all of these terms tend to inspire defensive reactions from those whose work is being inspected.
    • While many agencies evaluate themselves (Anderson: 268), typically evaluation offices are administratively (and physically and psychologically) removed from the operational offices they are evaluating.  Take the Department of Housing and Urban Development (HUD).  The Program Evaluation Division reports to the HUD Assistant Secretary for Policy Development and Research. Operational offices are headed by other assistant secretaries, as can be seen in the overall department organizational chart; the distance between operators and evaluators makes for a difficult relationship.
  • Temporal “eyeshades”
    • As noted just above in terms of the “availability of information,” political pressures may call for premature evaluations in that politicians will be eagerly looking for reasons to claim credit for programs they promoted or evidence to show failings in programs they opposed
      • The impact of politicians’ calendars on their behavior in the policymaking process was discussed earlier as the “Wimpy” factor
  • Evaluation results are often ignored because
    • No one cares (for example, having gotten a policy adopted, interest groups and politicians may feel they have earned all the political credit they are going to get, and so move on to another issue)
    • They involve such complicated material no one reads them: as evidence, Anderson (299) mentions the “methodological conflict” that emerged from the Westinghouse Head Start study and then says “an examination of these matters would be too lengthy and too technical to include here”
    • Results are mixed, as in the case of the American Economic Review Head Start study mentioned by Anderson (300); in government programs, as in most of life, very few things are unqualified successes or unqualified failures
    • Program supporters are so politically strong that evaluation results make no difference
    • There are competing evaluations with different findings, making a simple conclusion hard to reach  

Evaluation is commonly understood as a basis for deciding whether to continue and perhaps enlarge a program or, alternatively, to kill it (Anderson on termination: 301-304).  Evaluation researchers would find this simplistic.  More than a basis for a decision to “keep or kill,” people who specialize in systematic evaluation believe that the process is best understood as a foundation for considering policy changes. What kinds of changes?

Seen in this light, policy evaluation is a point in a policy loop or cycle.

Odds & ends

  • While systematic program evaluation is a relatively new activity, process evaluations, though certainly not by that name, have a much longer history, going back for years in studies of public administration; process evaluations focus on patterns of implementation, patterns which may have substantive implications for program results assessed in systematic program evaluation. The discussion of food safety programs (Anderson: 283-84) is all about process evaluations.  There is nothing in that discussion that addresses the question of whether, for example, the FDA’s Egg Safety Action Plan is a cost-effective program.
  • A number of analytical procedures have been introduced into government since the 1960s.  It sometimes seems as if every presidential administration has to come up with some government management reform gimmick.  Lyndon Johnson had the Planning-Programming-Budgeting System (PPBS), Nixon had Management by Objectives (MBO), and Carter had Zero-Based Budgeting (ZBB).  Saying that these procedures have been “introduced” is pretty apt, because typically it’s a matter of shaking hands, making your acquaintance, and moving on, which is to say these reforms have never been really institutionalized (unlike forecasting, benefit-cost analysis, and systematic evaluation which, difficult as they are, are ingrained in government).  Nonetheless, PPBS, MBO, ZBB, forecasting, benefit-cost analysis, and program evaluation are efforts to make policymaking more systematic.  (But they still fall short both in principle and application from comprehensive rationality.)  Systematic analysis, if it is really systematic, calls for experts who are trained in analytic techniques.
    • Then again, the development of these analytic devices may have less to do with improving policymaking and more to do with political preferences; in other words, they may have latent functions, as Anderson (295) suggests was the case with the antiregulatory Reagan administration’s order that regulations be put through a benefit-cost analysis
  • Anderson (279) writes that “evaluations can also be used for less laudable purposes.”  Remember latent functions?
  • Here’s a thought:  Many individuals and organizations are impacted by several programs at once. Children in low-income families may have the benefit of free school lunches, Medicaid services, and subsidized housing.  Looking at certain outcomes for children—academic performance, educational persistence, avoidance of unwanted pregnancies, health measures, whatever—can we know that the nutritional benefits of a free school lunch account for better outcomes, as opposed to their being the product of one of the other programs?  Similarly, if we are looking at corporate income or job-creation, to what should outcomes be attributed?  Corporate tax policies, export promotion, import barriers, the Federal Reserve’s monetary policy, or another government policy?
  • And here’s a question: Anderson (291) discusses a policy for black sharecroppers that failed as agricultural policy but had other “significant, positive, long-term effects.” Does a program that fails its intended purpose but does achieve other good things qualify as a successful policy?

 

 

 
 

 


MC Escher, Relativity (1953)